Automatic Invocation Linking for Collaborative Web-Based Corpora

Authors

  • James Gardner
  • Aaron Krowne
  • Li Xiong
Affiliations: J. Gardner and L. Xiong, Department of Mathematics and Computer Science, Emory University, 400 Dowman Dr., Atlanta, GA 30322; A. Krowne, PlanetMath.org, 4336 Birchlake Ct., Alexandria, VA 23309.

Published in: R. Chbeir et al., Emergent Web Intelligence: Advanced Information Retrieval, Advanced Information and Knowledge Processing, DOI 10.1007/978-1-84996-074-8_2, © Springer-Verlag London Limited 2010.

Abstract

Collaborative online encyclopedias or knowledge bases such as Wikipedia and PlanetMath are becoming increasingly popular because of their open access, comprehensive and interlinked content, rapid and continual updates, and community interactivity. To understand a particular concept in these knowledge bases, a reader needs to learn about related and underlying concepts. In this chapter, we introduce the problem of invocation linking for collaborative encyclopedias or knowledge bases, review the state of the art for invocation linking, including the popular linking system of Wikipedia, discuss the problems and challenges of automatic linking, and present the NNexus approach, an abstraction and generalization of the automatic linking system used by PlanetMath.org. The chapter emphasizes both research problems and practical design issues through discussion of real-world scenarios and hence is suitable for both researchers in web intelligence and practitioners looking to adopt the techniques. Below is a brief outline of the chapter.

Problem and Motivation. We first introduce the problem of invocation linking for online collaborative encyclopedias or knowledge bases. An online encyclopedia consists of multiple entries. An invocation link is a hyperlink from a term or phrase in an entry representing a concept to another entry that defines the concept. It allows a reader to easily “jump” to requisite concepts in order to fully understand the current one. We refer to the term or phrase being linked from as the link source and the entry being linked to as the link target. The problem of invocation linking is how to add these invocation links in an online encyclopedia in order to build a semantic concept network.

State of the Art. We review the state of the art for invocation linking in current online encyclopedias and knowledge bases. The existing approaches can be mainly classified into: 1) manual linking, where both the link source and link target are explicitly defined by the user (such as blog software), 2) semi-automatic linking, where the link sources are explicitly marked by the user but the link target is determined automatically (such as Wikipedia), and 3) automatic linking, where both the link source and link target are determined automatically. We discuss the representative systems for each approach and illustrate their advantages and disadvantages. We will also review potential technologies such as web search and recommender systems and discuss their applicability to invocation linking.

Automatic Invocation Linking. We advocate in this chapter the automatic linking approach, as we believe that the manual and semi-automatic approaches place an unnecessary burden on contributors and, in addition, require continuous re-inspection of the entire corpus by writers or other maintainers as the corpus grows and changes. We discuss the challenges and design goals for developing such an automatic linking system, including linking quality, efficiency and scalability, and generalization to multiple corpora.

NNexus Approach. In particular, we present the NNexus system, an automatic linking system that we have developed as an abstraction and generalization of the linking component of PlanetMath (planetmath.org), PlanetPhysics (planetphysics.org), and other sites. We discuss a number of key features and design ideas of NNexus in addressing the challenges for invocation linking. NNexus provides an effective linking scheme utilizing metadata to automatically identify link sources and link targets. It achieves good linking quality with a classification-based link steering approach and an interactive entry filtering component. It achieves good efficiency and scalability through its efficient data structures as well as a mechanism for efficiently updating the links between entries that are related to newly defined or modified concepts in the corpus. Finally, its implementation utilizes OWL and has a simple interface, which allows for an almost unlimited number of online corpora to interconnect for automatic linking.

Conclusions and Open Issues. We close the chapter by discussing a set of interesting issues and open problems for invocation linking.
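The notions of link source, link target, and automatic linking described in the abstract can be made concrete with a small sketch. The Python below is an illustration only, not the NNexus implementation: it assumes each entry declares the concept labels it defines (the "metadata"), inverts those declarations into a concept map, and scans an entry's text for the longest matching phrase at each position. The names `build_concept_map` and `find_invocation_links`, the entry identifiers, and the sample text are all hypothetical, and the sketch omits the classification-based link steering, entry filtering, and incremental link updating mentioned in the abstract.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical concept map: tuple of lowercase words -> id of the defining entry.
ConceptMap = Dict[Tuple[str, ...], str]


def build_concept_map(defines: Dict[str, List[str]]) -> ConceptMap:
    """Invert {entry_id: [concept labels it defines]} into {label words: entry_id}."""
    concept_map: ConceptMap = {}
    for entry_id, labels in defines.items():
        for label in labels:
            concept_map[tuple(label.lower().split())] = entry_id
    return concept_map


def find_invocation_links(
    entry_id: str, text: str, concept_map: ConceptMap
) -> List[Tuple[str, str]]:
    """Return (link source phrase, link target entry id) pairs for one entry.

    Assumptions of this sketch: exact word matching (no plural or
    morphological handling), longest phrase wins at each position, an entry
    never links to itself, and each target is linked at most once.
    """
    words = re.findall(r"[a-z]+", text.lower())
    max_len = max((len(key) for key in concept_map), default=0)
    links: List[Tuple[str, str]] = []
    linked_targets = set()
    i = 0
    while i < len(words):
        step = 1
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            target = concept_map.get(phrase)
            if target is None:
                continue
            if target != entry_id and target not in linked_targets:
                links.append((" ".join(phrase), target))
                linked_targets.add(target)
            step = n  # consume the matched phrase even if no link was emitted
            break
        i += step
    return links


if __name__ == "__main__":
    concept_map = build_concept_map({
        "Graph": ["graph"],
        "PlanarGraph": ["planar graph"],
        "Edge": ["edge"],
    })
    text = "Every planar graph is a graph whose edge set can be drawn without crossings."
    print(find_invocation_links("K4", text, concept_map))
```

Trying the longest phrase first keeps "planar graph" from being linked to the more general "graph" entry, which is one simple way to approximate the linking-quality concern raised in the abstract. A production system would need disambiguation when several entries define the same label.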


Similar resources

Phrase Detectives: A Web-based Collaborative Annotation Game

Annotated corpora of the size needed for modern computational linguistics research cannot be created by small groups of hand annotators. One solution is to exploit collaborative work on the Web and one way to do this is through games like the ESP game. Applying this methodology however requires developing methods for teaching subjects the rules of the game and evaluating their contribution whil...


Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user’s item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...


QoS-based Web Service Recommendation using Popular-dependent Collaborative Filtering

Since most organizations present their services electronically, the number of functionally equivalent web services is increasing, as is the number of users who employ those web services. Consequently, the users and the web services generate plenty of information, which makes it hard for users to find appropriate web services. Therefore, it is required to provi...


Architectural Plan for Constructing Fault Tolerable Workflow Engines Based on Grid Service

In this paper, the design and implementation of a fault tolerable architecture for scientific workflow engines is presented. The engines are assumed to be implemented as composite web services. Current architectures for workflow engines make no provision for substituting faulty web services with correct ones at run time. The difficulty is to roll back the execution state of the workflo...




Publication date: 2010